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Maximum likelihood estimation is one of the most used methods in quantum state tomography, 
where the aim is to reconstruct the density matrix of a physical system from measurement results. 
One strategy to deal with positivity and unit trace constraints is to parameterize the matrix to be 
reconstructed in order to ensure that it is physical. In this case, the negative log-likelihood function 
in terms of the parameters, may have several local minima. In various papers in the field, a source of 
errors in this process has been associated to the possibility that most of these local minima are not 
global, so that optimization methods could be trapped in the wrong minimum, leading to a wrong 
density matrix. Here we show that, for convex negative log-likelihood functions, all local minima 
of the unconstrained parameterized problem are global, thus any minimizer leads to the maximum 
likelihood estimation for the density matrix. We also discuss some practical sources of errors. 



I. INTRODUCTION 

Preparation, manipulation and characterization of quantum systems are essential tasks for 
quantum information processing It is known that measurement on a quantum system 

projects it onto a probable state. Therefore it is impossible to determine the state of the system 
with a single copy. Moreover the non-cloning theorem ^ forbids the production of several copies 
of the unknown state for its further reconstruction. However, if one has a source of quantum 
systems which are identically prepared, it is possible to reconstruct its quantum state through 
Quantum State Tomography (QST): an experimental process where the ensemble of unknown, but 
identically prepared quantum states, is characterized by a sequence of measurements in different 
basis, allowing the reconstruction of its density matrix. 

The first approach that appeared in literature was linear inversion [5], based on inversion 
of the Born's rule, where the probabilities predicted by quantum mechanics are equated to 
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experimental normalized frequencies. However, due to experimental noise, the recovered state 
may not correspond to a physical state. To ensure that the estimated density matrix is physical, 
we need to restrict it to the space of semidefinite positive and unit trace matrices. This is a problem 
of statistical inference called constrained parameter estimation p] . 

The most commonly used method for parameter estimation, due to its good asymptotic 
properties [3 [7], is the maximum likelihood estimation (MLE). It was used in QST [S], and in 
particular for the reconstruction of states of two photonic qubits jl] . This method has been widely 
used and discussed in the literature Other methods can also be used for QST, for instance 

the ones based on Bayesian Inference, as discussed in However, in this work we focus on 

maximum likelihood estimation. 

The MLE method seeks the matrix that maximizes the probability of observed experimental 
data, restricting the search to semidefinite positive and unit trace density matrices. Therefore, 
to get the maximum likelihood estimation, we need to solve a constrained optimization problem 
OH!]. As the dimension of the matrix to be reconstructed increases, the optimization problem 
becomes more difficult. Alternative approaches to full optimization have been discussed but 
assuming some prior knowledge about the state. Other methods to obtain the MLE of density 
matrix are based on extremal equations, like RpR iteration presented in |10| . In this work, we 
study the parameterization approach that appeared in ^ 1141 and show the equivalence between 
local solutions of the associated unconstrained optimization problem. 

One way of treating the problem of the constraints is parameterizing the matrix to ensure 
positivity and unit trace, and solving the new unconstrained optimization problem with these 
parameters [1]. However, this new optimization problem has a nonconvex objective function with 
several local minima. Since MLE method requires the global minimum, we could have a problem, 
because a local non-global minimum could lead to a wrong density matrix estimation. These local 
minima were taken as possible non-global in the literature [U 1141 115| and this problem was pointed 
as a source of errors in quantum state reconstruction. Our contribution here is to show that all 
local minimizers are in fact global. Therefore, no matter what minimum the optimization method 
finds, the reconstructed density matrix is always the same. We also show that some complaints 
found in the literature, about errors in the reconstruction, may be caused by the use of optimization 
methods where global convergence |121 113j to local minimizers does not hold or stop prematurely, 
indicating the stagnation (poor progress) of the optimization process. 

This paper is organized as follow. Section |TT] reviews the standard methods of Quantum State 
Tomography: linear inversion and maximum likelihood estimation (MLE). Section III discusses 
the parameterization of the density matrix in MLE to assure that the reconstructed matrix is 
physical. The mathematical foundation needed to prove the equivalence of local minimizers for 
the unconstrained optimization problem is presented in Secti on |IV[ Section |V] presents our main 



result as a corollary of the theorem demonstrated in Section IV The spurious local minima, for 



the associated unconstrained optimization problem, arise from over-parameterization of the density 
matrix, but we show that all of them return the same global minima and the same reconstructed 
density matrix. Some examples illustrating this point and some problems that may occur using 
inappropriate optimization procedures are presented in Section VI Finally, we conclude in Section 
lYHl 



II. LINEAR INVERSION AND MLE 



In this section we review the basic ideas of linear inversion and maximum likelihood estimation. 
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A. Linear Inversion 

Let 1^1/ 1 be an orthonormal basis for Hermitian matrices space of order d, where 

tr (f,f ^) = 5,^,, , (1) 

A = ^tr(f,A)f, , VA 

Using this basis, we can write a density matrix p as 

p = Y,S,T,, (2) 

where the coefficients Si, — tr (F^p) are known as Stokes parameters. If we take measurements 
described by a POVM set {O^}, quantum mechanics teUs us that 

f, « p{p) = tr [O^p) , (3) 

where = n^/N denotes the normahzed frequencies, (n^ and N denote the number of times 
outcome fi occurs and the normahzation constant respectively.). Taking the inner product using 
Q and we have 

U^tT{0^p)=Y,S.tr{O^T„). (4) 

Consider the index p, labelhng the experimental setup/result pairs. Then taking an informationally 
complete set of measurements, we can determine the parameters S'y, and therefore p, solving the 
linear system 

Bs = /, (5) 

where s is the vector with 5^ components, / is the vector with components /p, and B is the (P x (P 
matrix with entries B,y^ — tT(OfjT,y). This method is called linear tomography reconstruction or 
linear inversion. 

Nevertheless, the experimental noise disturbs the normalized frequencies which in turn do not 
produce good approximations for probabilities tr {O^p). Moreover, the law of large numbers states 
that the normalized frequencies approximate the theoretical probabilities only if the measurements 
are repeated a large number of times, which is not always the case. This implies that the recovered 
matrix may not be a legitimate density matrix, having negative eigenvalues or not having unit 
trace. In order to overcome this problem, it is usual to apply the maximum likelihood estimation 
method PlISl. 

B. Maximum Likelihood Estimation 

In statistical inference [51 [7], the Maximum Likelihood Estimation (MLE) is a method to 
estimate an unknown parameter of a population, based on sampled data x. The idea is to 
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maximize the conditional probability P(x \ 9), that is, the maximum likelihood estimation Oj^ie is 
the value that maximizes the probability of getting the observed data. 

The MLE has a lot of good properties from the statistical point of view. This explains why it 
is widely used for point estimation. Among others, we can cite: 

• Consistency: lim P{\9n~d\ < e) = 1, Ve > 0, where 0n is the estimation for a finite sample 

n— f oo 

of size n. This means that the maximum likelihood estimator 9 is unbiased when n — > oo. 

• Asymptotic normality and efficiency: as n oo, 9 ^ N{9,I{9)^^), that is, 9 has 
a normal distribution with mean 9 and covariance matrix I{9)~^, where T{9) is the Fisher 
information matrix. In other words, as the sample size increases the MLE becomes an efficient 
estimator. 

• In a finite sample, if a minimum variance unbiased estimator exists, then the method of 
MLE chooses it; that is, MLE performs at least as well as competitors in finite samples. 

• Finally, the invariance property, that is, if 9 is the MLE for 9 then t{9) is the MLE for t{9). 
Note that, for any observable O, if p is the MLE for p, then tr (Op) is the MLE for tr {Op). 

In QST [HIHIITI], the MLE can be formulated as 



max C{p) 
p 

s.t tr(p) = 1, (6) 
P hO, 



where C{p) = P{n\p) is the probability of getting the experimental outcomes n given the 
parameter p. The function C{p) is called likelihood function. 

Usually, instead of maximizing C{p), it is easier to minimize the negative log-likelihood 
function 



F{p) ^- log C{p). 

So, the optimization problem is 

min F{p) 

p 

s.t tr(p) = 1, (7) 
p hO- 



From the optimization point of view, if F{p) is a convex function, we have a convex optimization 
problem [T^ since the feasible set S — {p \ p — p^ , tr (p) = 1, p h O} is convex. This kind of 
problem is interesting because every local minimum is a global one. 

The constraints defining the optimization problem ([T]) are known as linear matrix 
inequalities/equalities [16j and when the objective function F(p) is linear we have a semidefinite 
programming problem (SDP) [TS1[T7] but, in general, this is not the case. 
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Although we have efficient interior point methods for semidefinite programming |16l 117) . 
the reformulation of ^ into a linear SDP form requires the introduction of auxiliary variables and 
constraints that increase the dimension, and therefore the difficulty of the optimization problem. 

Here we consider a reformulation presented in [TJ I14j that converts the problem ([7| into an 
unconstrained optimization problem. 



III. PARAMETERIZED DENSITY MATRIX 



Let us remember that the matrix p must be Hermitian semidefinite positive and have unit trace. 
To satisfy the non negativity constraint, we can consider the product T^T, for any matrix T, where 
it is clear that 



{ij\T^T\i;) = {^'\i;')>0, V|i^). 
Moreover, the trace condition can be achieved by normalization: 

tr (TtT) ■ 

This type of matrix satisfies the mathematical properties needed for a density matrix. 



(8) 



(9) 



For example, if the state space has dimension d, a natural choice for the matrix T in ^ 
is an upper triangular matrix, given by 



/ ti td+i + itd+2 



Tit) 



td-2-i + itd^ \ 



t2 



td+3 + itd+4 



Vo 







td 



(10) 



This choice, presented in [T], is motivated by the fact that in a system of n qubits, we need to 
determine 4" parameters (the Stokes parameters) to estimate the density matrix. Moreover, this 
choice resembles the Cholesky factor p[H] if only positive diagonal entries are considered. 
Then, we have a parameterized version of the matrix p: 



Pit) = 



T{tyT{t) 
tr(T(t)tr(i))- 



(11) 



Using this parameterization the constraints over the matrix are satisfied. Notice this is a continuous 
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and surjective mapping from parameters space M.'f = M'^^ \ {0} to the density matrices space S. 
However, p{t) is not one-to-one. For example, consider the matrix: 

_ f 1/2 
I 1/2 



By inspection of (11), it is easy to find at least four solutions t = ( ±l/-\/2, ±1/a/2, 0, 0)'^ 



Moreover, note that any nonzero multiple of t, at Va ^ 0, leads to the same matrix. 



The parameterization (11) ensures positivity and unit trace, but now we need to minimize 



the function F{p{t)) with respect to the new parameters t^'s. Of course, the structure of this 
function depends on the probability distribution assumption for the experimental uncertainties. 

Commonly, F{p) is a convex function of p, for p (z S. However, after the parameterization 
(11), the new objective function F{p{t)) becomes a nonconvex function of t with several local 



To illustrate this fact consider the photonic state tomography as treated in In this 

case the noise is assumed to be Gaussian, as well as in many other physical systems. Then, the 
probability of getting the observed data n, given p{t), is: 



P{n\p{t)) = —l[exp 



(12) 



where is the expected value, tr^ is the standard deviation (approximately y/nj^) for the number 
of times the p-th measurement result occurs, and A^o is a normalization factor. 
According to ([3|, 



Ntr{0^p{t)). 



So, we have 



Pin\p{t)) ^ ^l[cxp 



{NtTi0^p{t))-n^ 
2mv{0^pit)) 



l21 



(13) 



(14) 



and to maximize this function is sufficient to minimize the negative log-likelihood: 

[tr(0^p(t))-/^]2 



2tr(0,,p(i)) 



1 tTi0^p{t))-U 
2^1 ^tr{0,p{t)) ^ 



(15) 



This is a nonconvex function of t, with several local minimizers. For instance, if we consider the 
one qubit tomography case for the polarization of a single photon, the matrix p{t) is: 



1 



ti 



tits + ititi 



tl + tl+tl + tl{tit3-ltit4 tl+tl + tl 



(16) 
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Making projective measurements based on {\H) ,\V) ,\D) ,\R)} [U [T3], which correspond 
respectively to the Unear polarization states horizontal, vertical, diagonal and to the right circular 
polarization state, we have: 



tT{\H){H\p) = 


{H\p{t)\H) 


ti:{\V}{V\p) - 


{V\pit)\V) 


ti:{\D){D\p) = 


{D\pit)\D) 


tT{\R){R\p) - 


{R\p{t)\R) 



t 



ti + ti + ts + ti ' 

^2 + ^3 "I" ^\ 



(17) 



tl + tl+tl+ ' 



4 

2ht3 



ti + q + ti + tj 
tl + tl+tj + tl 



Substituting these components in expression (15), clearly we obtain a nonconvex function F(t). 
The examples of Section \VJ\ illustrate this fact. 



Since F{p(t)) is a nonconvex function of t, we could have problems in the optimization 
process. If local, non-global minimizers exist, the algorithm could find a local minimizer t* that is 
not a global one and the corresponding density matrix p{t*) might not be the MLE p. 

Some papers in the field [TJ 1141 I15| consider the existence of several local minima which 
are not global. In this case, a good initial guess must be provided, so that the optimization 
procedure can achieve the global minimum. In order to overcome this difficult, those authors 
propose smart starting points for the optimization method, hoping that these points will lead to 
convergence for a global minimum. For example, in _lj, the initial guess is based on plm obtained 
from the linear inversion. However, calculation of this starting point, requires solution of a linear 
system, eigenvalues estimation, and matrix factorization. This represents a high computational 
cost and becomes prohibitive for large systems. Moreover, there is no guarantee that the global 
minimum is achieved. 

In Section |V] we prove that, contrary to what was believed, all local minimizers t of F{p{t)) are 
equivalent, in the sense that they all lead to the same reconstructed density matrix p{t). As F{p) 
we consider, not only the function ( |15[ ) of photonic tomography case, but any convex function of 
p in S. Indeed, we exploit the mathematical properties of the mapping p{t), defined by ( |11[ ), to 
prove that all local minimizers of F{p{t)) are equivalent. 



First, the Section [iy| addresses the mathematical theory needed to prove that all local minimizers 
of F{p{t)), for t e Wll, are in fact global. 



IV. THE MAPPING p{t) AND THE FEASIBLE SET S 

We start this section by considering the general problem of optimizing the convex function 
f{x), such that a; e ri, with also convex, given the parameterization x{t). We develop sufficient 
conditions to prove the equivalence between local minimizers. Then we address the optimization 
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of F{p{t)) as a particular case. 

Consider the convex optimization problem 

where / is convex on the convex set 51 C 



min f{x) 

X 

s.a x € il, 



(18) 



Let X = x(t) be a continuous and surjective mapping x 
the associated unconstrained optimization problem: 



mm ^{t) = fixit)). 



D ^ n, D C M."- and consider 

(19) 

It is clear that if t* is a global minimizer of ( |19[ ), then x* = x{t*) is also a global minimizer of 
(18). In fact, (/'(t*) < (jjit) Vi, that is, f{x(t^ < f{x{t)) Vi, and since x{t) is surjective, this 
implies that f{x*) < /(a;), Va; G il. 

However, in general, optimization methods just can find local minimizers, and some local 
minimizers of ( 19 1 could not be global. Therefore, we need that a stronger assumption holds for 
the mapping x{t) in order to show that all local minimizers of (191 are global. 

If x(t) is an homeomorphisni, onto fi, in the subsets Da C D such that D — UqUq. 
then we can prove the desired result. This condition is weaker than requiring that x(t) be a 
homeomorphism, and it is easier to hold for some mappings, in particular for mapping p(i) in QST. 

Assumption Al. For the mapping x(t) x : D ^ fl, there are sets Da C D such that 
D = UaDa and, for each Da, x : Da — ^ O is an homeomorphism. 



Theorem IV. 1. Let x{t), x : D ^ Q be continuous and surjective. If Assumption Al holds, then 
any local minimizer of is global. 



Proof. Let t* be a global minimizer of ( 19 ) and suppose there is a local, non-global, minimizer t. 
Then, f{x{i)) < fix{t)) S B{i,5) n D, but f{x{t*)) < f{x{t)). 
Let X* — x{t*) and x = x{t). Using the convexity of / in Q, we have: 

f{Xx* + (1 - \)x) < Xf{x*) + (1 - \)f{x) < Xf{x) + (1 - X)f{x) = fix), VA G (0, 1). 

Notice that x — Xx* + {1 — X)x e J7, VA G (0, 1), since 51 is a convex set, and for A sufficiently close to 
zero, X is as close to x as necessary, that is, given e > there exists A > such that x G B{x, e) nil. 

Since x(t) satisfies Assumption Al, x{t) is an homeomorphism from Da onto 51, where Da 
is such that i G Da. Thus, by continuity of the inverse of x{t), given > 0, there exists s{5q) > 
such that X G B{x, e{So)) n 51 implies i G B{t, So) n D, where i G Da, i — x^^{x). In particular for 
So < S. Therefore, there is t G B{t,6) D such that f{x{t)) < f{x{t)), which contradicts the fact 
that i is a local minimizer. □ 



Now, as a particular case of the above results we will prove the equivalence between local 
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minimizers of F{p{t)). To do this, we need to show that Assumption Al holds for p(t) defined 
by Actually, we show that p : — >■ int (5) satisfies the Assumption Al, where 

k£ = |i e Mf IUt^O, Vi = 1, . . . , d| and int (5) is the relative interior of S. 

Proposition IV. 2. The Assumption Al holds for the mapping p : M^^ — > int{S). 

Proof. First, we need to define the sets D^, j such that = Ua.jD^.j- The index j is one-to-one 
with the possible sign permutations of the diagonal elements ofT(t) {2'^ possible sign permutations), 
that is, 

Da,j = |i G I ||t||2 = a and ti, . . . , satisfing the j-sign configurationj , 
which implies that, for t G -Dqj, 

Pit) = -Titynt). 

a 

Cholesky's factorization [THl HHj states that an Hermitian matrix A is positive definite, if and 
only if there is a unique upper triangular matrix T, with positive diagonal elements such that 
A = T^T. We stress that the existence and uniqueness of the upper triangular factor T holds not 
only for positive diagonal elements, but also for any fixed order of signs of the diagonal elements 
T. Moreover, the triangular factor is continuous because each of its entries is a composition of 
continuous functions of the entries of A, [2U] . 

Thus, p : Daj int (S) defines an homeomorphism because it is continuous, bijective, 
and has continuous inverse, which is just the triangular factor with a fixed choice of signs. Since 
I^** = ^a,jDa,j, then p : Rf^ — > int (S) satisfies Assumption Af . □ 




.2 

Figure 1: Covering R,, with the Da,j sets. 
Figure [l] shows how the sets Daj are defined in the case of a 2 x 2 symmetric matrix. 
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V. MAIN RESULT 



In this section, using the Theorem |I V ■ 1 1 and Proposition |IV. 2{ we prove that all local minimizers 
of the negative log-likelihood F{p{t)) are in fact global. For any convex function the problem 

of minimizing F[p) on 5 is a convex optimization problem, where local minimizers p* are also 
global. Indeed, if we have an informationally complete set of measurements, then the minimizer 
of this convex optimization problem is unique. 

Since we have a convex minimization problem and the mapping p{t) satisfies the Assumption 
Al, as a corollary of Theorem |IV. 1| and Proposition |IV. 2[ we establish the main result of this paper. 



Corollary A. // F{p) is a convex function for p Cz S then all local minimizers of F{p{t)), for 



t e 



are also global. 



Proof. Since F{p) is convex in S then it is also co nvex in int {S). Thus, minimizing F{p) in int (S) is 
a convex optimization problem. By Proposition IV. 2 the mapping p{t), p : M^^ — )• int (S), satisfies 
Assumption Al, then Theorem IV. 1 applies, which implies that all local minimizers of F{p[t)) in 
m£ are global. □ 



The practical result provided by Corollary |V A| is that it does not matter what local minimizer 
t S IR** is found. All of them are equivalent, in the sense that they lead to the same density 
matrix p — p{t) corresponding to the maximum likelihood estimation. 

Albeit the above result considers p{t) restricted to wf^ onto int (5), it is straightforward to 
show that these are dense sets in and 5, respectively. Thus, by continuity of F{p) and F{p{t)), 
we have 

min i^(p) = inf F{p) and min F{p{t)) = inf F{p{t)). 
peint(5) temf temf. 

Therefore, in practice, it is enough to consider instead of W; . Indeed, it is enough to 
consider just one set Da.j, for instance, restricting \\t\\2 = 1 and ti > for i — l,...,d. But 
these constraints become the optimization problem harder than the unconstrained one, and the 
Corollary |V A| gives us the guarantee that any local minimizer of the unconstrained problem in 
Rf^ can be used. 

In fact, points belonging to \ M!^^ correspond to rank deficient density matrices that 
live in the boundary of S. From the numerical optimization point of view, solutions in the 
boundary, in general, are approached only in the limit. That is, numerically, a point in the interior 
that is sufficiently close to the true boundary solution is declared as the solution within certain 
tolerance criteria. Besides, the negative log-likelihood function, for example ( [l5| ), could not be 
defined (goes to infinity) in some states in the boundary, depending on the chosen POVM set. 
From the experimental point of view, matrices in the boundary of S correspond to states having 
zero eigenvalues, for example pure states, that are rarely seen in practice due to either difficulty 
of realization or by imperfections in the apparatus leading to noisy experimental data. 
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Finally, we stress that the above result remains valid for any convex function F{p) over S. 
In particular we can apply the result to the negative log-likelihood function F{p) — — \ogP{n\p) 
because whether P{n\p) is Gaussian or multinomial, the function F{p) will be convex. 



VI. NUMERICAL EXPERIENCE 



We pointed out that in some papers [11 114|, I15|, the possibility of finding local minimizers that are 
not global, in the minimization of function (15), is considered. However in Section |V]wc showed 



that all local minimizers of F{p(t)) in are also global. What may have occurred in those cases 
was the stagnation of certain methods which did not have global convergence property [TH [13] 
(this property should not be confused with convergence to global minimizers). 

Definition VI. 1. We say that an algorithm for the optimization problem 

min F(x) 

has global convergence property to stationary points ( VF(x) =0 ), if each accumulation point of 
the sequence of iterates {a^fc}^]^ generated by this algorithm is a stationary point of F{x). 

Some unconstrained optimization methods, for instance the known Nelder-Mead method pil 122) . 
have no global convergence property, that is, a stagnation point could be not even stationary. 
Also, commercial packages implement some methods that have no global convergence: for 
example, fminsearch of MATLAB, which implements the Nelder-Mead method, and FindMinimum 
of Mathematica, which implements the Powell's method [53]. Both of these algorithms, used in 
quantum state tomography, may be stopped prematurely, before reaching a local minimizer or a 
stationary point. 

Even methods that have global convergence property can still stop prematurely due to 
numerical difficulties. Thus, it is important to investigate which stopping criterion was triggered, 
in order to avoid wrong conclusions. In practice, we usually hope that an optimization algorithm 
could find, at least, an e— stationary point x* , that is, ||Vi^(a;*)|| < e. However, the optimization 
problem is sometimes not well scaled, or so difficult, that another stopping criteria must be 
employed to avoid that the iterative algorithm runs indefinitely. Normally, stagnation criteria like 
small changes in variables ||a;fc+i — a;fc|| < £x or in objective function \F(x}^+i) — F{xk)\ < £f are 
used and upper bounds are fixed for the maximum number of iterations or function evaluations. 
Since we prefer the e— stationarity criterion, the other tolerances should be smaller than e, for 
example Sx ~ £f ^ when < e ^ 1. 



Another undesirable consequence of parameterization (11) is that when ||i|| — > oo 



\\'VtF{p{t))\\ 0. So, optimization algorithms using as stopping criterion ||VF(p(t))|| < e. 



can stop in t points having large \\t\\, even if t is not a minimizer of (15). Thus, artificial bounds 
on the variables may be used in order to avoid this situation. 

The next examples show what can go wrong in QST using inappropriate methods, without 
global convergence or not numerically stable. For these examples we fix the tolerances 
£x — £f = 10"^ and the maximum number of iterations and function evaluations M = 2x 200 x n, 
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where n is the number of variables of the optimization problem and 200 x n is the default limit 
used in fminsearch routine of MATLAB. One can set these tolerances using the optimset routine. 
Unfortunately, MATLAB also uses the relative function change tolerance ep a.s the gradient norm 
tolerance e, so in the examples 1 and 2, e — 10~®. 

Example 1. Consider p — \H){H\ as the true state that we want to identify with tomography. 
The expected values for projective measurements based on {\H),\V),\D),\R)}, perturbed by 
some kind of error (computationally generated based on Gaussian distribution), plays the role of 
normalized frequencies in an experiment: Jh = 0.9990, fv = 0.0002, fo = 0.4995, Jr = 0.4994. 
Using fminsearch of MATLAB R2009a, with starting point to = (-0.0001, 0.999, 0.001, 0.999)^, 
we recover the following matrix: 

0.2219 -0.3634 + 0.0002i 

-0.3634- 0.0002i 0.7781 

The optimization procedure stopped after k = 595 iterations, with objective function value F = 
2.2316, and gradient norm ||VF|| = 0.0013. The stopping criterion verified was ||a; — s^zHoo — 
and \ f — fi \ < Ep, where / = f{x) is the function value in the best point x and /;'s are the function 
values on the other simplex points xj's. 

On the other hand, we used Isqnonlin routine for nonlinear least squares minimization, also of 
MATLAB R2009a, regarding the special structure of function (15). Providing the partial derivatives 
to compose the Jacobian matrix, the command runs a trust-region-reflective algorithm |24j . Using 
the same starting point, we get: 



0.9998 -0.0005 
-0.0005 - 0.0006i 0.0002 



0.0006i 



with optimization outputs F = 3.2 x 10^"^, ||VF|| = 5.3024 x 10"^°, after k = 22 iterations. 
Notice the difference between the gradient norms in these two cases. Clearly, the first does not 
correspond to a stationary point, nor a local minimizer. 



Another important fact is the non-unique parameterization of p{t). 
data of the above example. Using Isqnonlin from the starting point to 
obtain the solution 



i= (0.999, 0.0141, -0.0005, 0.0006)'^, 



which gives the matrix 



Pit) 



0.9998 -0.0005 
-0.0005 - 0.0006i 0.0002 



0.0006i 



Consider the same 
(1, 0.001, 0, 0)^, we 



Starting from another initial point I'q = (—1, —0.001, 0, 0)"^, the solution is 

t' = (-0.999, -0.0141, 0.0005, -0.0006)^, 
which gives the same matrix 

0.0006i 



Pit') = 



0.9998 
-0.0005 



-0.0005 
0.0006i 0.0002 
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Figure 2: Real (left) and imaginary (right) parts of density matrix of Example 2 using the derivative-free 
solver fminsearch of MATLAB. 

Both i and t' satisfy the numerical stationarity criterion ||VF|| < e. Despite the distinct local 
minimizers, the recovered matrix is the same. 

Example 2. Consider the following experimental data extracted from a complete set of 
measurements in the polarization degrees of freedom of two photons generated in a Parametric 
Down Conversion Process, where the prepared state is nearly pure: 



n-i 


= 3043 


ns = 


1546 


ng 


= 1556 


"13 = 


1070 


"-2 


= 32 


ng = 


1283 


nia 


= 122 


ni4 = 


1048 




= 2159 


nj = 


938 


nil 


= 1271 


«15 = 


1611 


n4 


19 


ns = 


1595 


ni2 


= 1621 


«16 = 


114. 



First, we use fminsearch of MATLAB R2009a, with initial guess ti = l/\/d for i < d and U = 0, 
for i > d (corresponding to the maximally mixed state ,I/d). The algorithm stops because the 
maximum number of function evaluation (2 x 16 x 200 = 6400) is reached. The recovered density 
matrix is shown in Figure 2l and presents purity tr (p^) = 0.5298. 

Second, using Isqnonlin oiMATLAB R2009a, with the same initial point, we get the density matrix 
of Figure |3j with purity tr (pf) = 0.9008. 

Example 3. To conclude the numerical examples, and to remark the meaning of Corollary |V A[ 
consider the state p = \R){R\, and its normalized frequencies fn = 0.5, fv = 0.5, fn = 0.5 and 
fn — 1. The theory presented in Section |v] tell us that the solution of minimizing F{p{t)) in 
Daj is unique, for each a and j. So, if we include the constraint \\t\\^ = 1, that is, fixing a = 1 

and choosing one fixed sign ordering among ++^-\ — , — h, , in the case of 2 x 2 matrices, then 

we have exactly one solution. To test this numerically, we define four optimization problems of 
minimizing F{p{t)) subject to WtW^ = 1 and for each problem the bound constraints: ii > and 
t2 > 0, ti > and ^2 < 0, ii < and ^2 > 0, ii < and t2 < 0. For each problem, we use a 
multistart procedure to generate 23 random initial points with components in the interval [—1, 1], 
which were used by fmincon routine to solve each constrained minimization problem. Inside each 
multistart procedure, we consider the tolerances Ex = £f = 10^^^ and that a problem is solved 
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Figure 3: Real (left) and imaginary (right) parts of density matrix of Example 2 using the nonlinear 
least-squares solver Isqnonlin of MATLAB. 

only if the first order optimality measure is less than e = 10^^, discarding solutions that do not 
match this criterion. We also considered, for each problem, a list of solutions, and after a new 
candidate solution i was obtained we include it in the list only if ||i — t\\^ > 10~^. For each choice 
of signs, each list had only one member as shown in Table |l] This numerical experiment confirms 





++ 


-1— 


-+ 






0.7071 


0.7071 


-0.7071 


-0.7071 




0.0024 


-0.0000 


0.0001 


-0.0002 




0.0000 


0.0000 


-0.0000 


0.0000 


U 


-0.7071 


-0.7071 


0.7071 


0.7071 



Table I: Solutions for each ordering of signs, after 23 random initial points, 
what was expected by the theory developed in Sections |IV| and |Vj 

These examples clarify the nature of local minimizcrs of F(p{t)) and illustrate some mistakes 
that can be made in the minimization of negative log-likelihood function when calculating the 
maximum likelihood estimation, stressing the importance of good theoretical and numerically 
stable optimization methods, as well as the analysis of stopping criteria and tolerances. 



VII. CONCLUSION 



We show that the MLE using the parameterization approach (11), when used with appropriate 
numerical optimization methods, is a reliable procedure to estimate the density matrix in Quantum 
State Tomography (QST). We prove that even though this approach implies an unconstrained 
minimization problem with several possible minimizers, all of them are global. Thus, if one uses 
a globally convergent and numerically stable minimization method, a local solution, that is also 
global, will be found regardless of any initial guess. This has important practical consequences, 
because as discussed, the utilization of smart initial guesses represents an additional computational 
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cost, and therefore becomes prohibitive for quantum state reconstruction of a large number of 
qubits. 

Moreover, contrary to what was assumed by some previous papers, our examples show 
that failures in the estimation of the optimal density matrix can be mostly due to the use of 
inappiopriatc minimization methods that do not have global convergence; property or suffer from 
numerical instability issues. This fact has probably been confused with the existence of local and 
non-global minimizers associated with the minimization process of MLE. Therefore, care must be 
taken of the stopping criteria and tolerances. 

Finally, our results regarding the parameterization p{t) and local minimizers of F{p{t)) 
remain valid for any convex function F{p). We believe that the discussion presented in this work 
is of interest for most of the quantum optics and quantum information community, and helps to 
improve the practical procedure of QST. 
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